feat: SeFi-Image support #1707

Open: fszontagh wants to merge 1 commit into leejet:master from fszontagh:feat/sefi-image-prototype

Conversation

@fszontagh commented Jun 24, 2026

Contributor

Summary

Adds inference support for SeFi-Image, a dual-time flow-matching T2I family built on the Flux2 backbone with a Qwen3-VL text encoder. Tech report: arXiv:2606.22568. See docs/sefi_image.md.

What's in:

  • VERSION_SEFI_IMAGE + version detection
  • Dual-time embedding block (semantic_embedder + texture_embedder, concat)
  • Per-stream Euler sampler with alpha-shift + delta_t
  • SeFi-aware Qwen3-VL conditioning (chat template, layers 9/18/27)
  • VAE BN normalization on packed texture latents
  • script/convert_sefi.py for converting a diffusers checkpoint into a single sd.cpp safetensors file
  • --extra-sample-args sefi_alpha=0.3 / sefi_delta_t=0.1 overrides
  • Filename heuristic: turbo in path => alpha=1.0, else alpha=0.3
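The last two bullets (filename heuristic plus `sefi_alpha` override) can be sketched as below. This is a minimal Python illustration, not the PR's C++ code: `default_alpha`, `resolve_alpha`, and the `overrides` dict are hypothetical names, and the rule that an explicit override wins over the filename default is an assumption. A later comment in this thread removes the heuristic entirely.

```python
def default_alpha(model_path: str) -> float:
    """Pick a default alpha from the model path, as the bullet above states."""
    # "turbo" anywhere in the path selects alpha = 1.0; everything else gets 0.3.
    return 1.0 if "turbo" in model_path else 0.3


def resolve_alpha(model_path: str, overrides: dict[str, float]) -> float:
    """Assumed precedence: an explicit sefi_alpha beats the filename default."""
    return overrides.get("sefi_alpha", default_alpha(model_path))


print(resolve_alpha("/path/to/sefi_1b_turbo.safetensors", {}))
print(resolve_alpha("/path/to/sefi_1b_base.safetensors", {}))
print(resolve_alpha("/path/to/sefi_1b_turbo.safetensors", {"sefi_alpha": 0.3}))
```

A plain substring match like this is fragile (any directory named `turbo` triggers it), which is one plausible reason to prefer an explicit setting.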

Related Issue / Discussion

Closes #1702.

Additional Information

Example

./build/bin/sd-cli \
  --model /path/to/sefi_1b_turbo.safetensors \
  --llm   /path/to/qwen3_vl_2b.safetensors \
  -p "a photograph of an orange tabby cat sitting on a couch" \
  --cfg-scale 1.0 --steps 4 -W 1024 -H 1024 -s 42 \
  --diffusion-fa --offload-to-cpu \
  -o out.png

Tested variants (all 7 from huggingface.co/SeFi-Image)

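| Variant  | Encoder     | Baseline (12GB VRAM) | --max-vram 8 --stream-layers |
|----------|-------------|----------------------|------------------------------|
| 1B-Base  | qwen3_vl_2b | ok 109s              | ok 172s                      |
| 1B-turbo | qwen3_vl_2b | ok 14s               | ok 17s                       |
| 2B-Base  | qwen3_vl_2b | ok 229s              | ok 296s                      |
| 2B-turbo | qwen3_vl_2b | ok 29s               | ok 25s                       |
| 5B-Base  | qwen3_vl_4b | OOM                  | ok 563s                      |
| 5B-turbo | qwen3_vl_4b | OOM                  | ok 170s                      |
| 5B-RL    | qwen3_vl_4b | OOM                  | ok 587s                      |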

5B variants use Qwen3-VL-4B-Instruct as the text encoder (1B/2B use 2B). 5B needs streaming on 12GB-class GPUs.
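To put a number on the cost of streaming, the sketch below divides the streamed time by the baseline time for the variants that fit in 12GB without streaming. The timings are copied from the table above; `timings` and `streaming_ratio` are names made up for this illustration, and single runs like these say nothing about run-to-run variance.

```python
# Wall-clock seconds from the table above: (baseline, --max-vram 8 --stream-layers).
timings = {
    "1B-Base": (109, 172),
    "1B-turbo": (14, 17),
    "2B-Base": (229, 296),
    "2B-turbo": (29, 25),
}


def streaming_ratio(baseline_s: float, streamed_s: float) -> float:
    """Streamed time as a multiple of the baseline time (above 1.0 means slower)."""
    return streamed_s / baseline_s


ratios = {name: streaming_ratio(*pair) for name, pair in timings.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

On these numbers the streamed runs take between roughly 0.86x and 1.58x of the baseline time. The 5B variants have no baseline to compare against because they run out of memory without streaming.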

@GreenShadows


The quality seems surprisingly good for such a small model.

Comment thread src/model/vae/auto_encoder_kl.hpp Outdated
Comment thread src/stable-diffusion.cpp Outdated
Comment thread src/stable-diffusion.cpp Outdated
Comment thread src/stable-diffusion.cpp Outdated
Comment thread README.md Outdated
Comment thread src/name_conversion.cpp Outdated
@sz1kormar


Speeds on turbo are amazing, given that you only need 4 steps and a CFG scale of 1 to pull it off. @fszontagh, could you attach some of your images here to show off SeFi-Image as an independent tester?

@fszontagh

Contributor Author

> Speeds on turbo are amazing, given that you only need 4 steps and a CFG scale of 1 to pull it off. @fszontagh, could you attach some of your images here to show off SeFi-Image as an independent tester?

I've started on the changes leejet requested. I'll post some images here afterwards.

@JohnLoveJoy

[image] [image]

It looks really good. These are images I got from Reddit.

@fszontagh

fszontagh commented Jun 25, 2026

Contributor Author
[image: sefi_refactored]
./build/bin/sd-cli \
  --diffusion-model /data/SD_MODELS/diffusion_models/sefi/sefi_5b_turbo.safetensors \
  --vae /data/SD_MODELS/vae/flux2_ae_from_sefi.safetensors \
  --llm /data/SD_MODELS/Text-encoder/sefi/qwen3_vl_4b.safetensors \
  -p "a photograph of an orange tabby cat sitting on a couch" \
  --cfg-scale 1.0 --steps 4 -W 1024 -H 1024 -s 42 \
  --extra-sample-args sefi_alpha=1.0 \
  --diffusion-fa --max-vram 8 --stream-layers --offload-to-cpu \
  -o /tmp/sefi_refactored.png

@fszontagh

Contributor Author

Pushed the rework. Per-comment:

  • auto_encoder_kl.hpp:516 (bn) - confirmed SeFi uses the standard Flux2 VAE. The bn.running_mean / bn.running_var weights match the hardcoded get_latents_mean_std constants within bf16. Dropped the SeFi-specific bn.* params, sefi_bn_apply, and the encode/decode branches. SeFi now uses the Flux2 VAE file directly (flux2_ae.safetensors); convert_sefi.py no longer emits a VAE file. diffusion_to_vae_latents still slices the 16 semantic channels before the standard Flux2 denormalize.
  • stable-diffusion.cpp:1341 (denoiser fork) - added SEFI_FLOW_PRED to the prediction_t enum. SeFi version maps to it. FLUX2_FLOW_PRED case is now version-agnostic; SEFI_FLOW_PRED case constructs SefiFlowDenoiser.
  • stable-diffusion.cpp:1348 (turbo knob) - removed the filename heuristic. Default timestep_shift_alpha = 1.0 (identity); base/RL pass --extra-sample-args sefi_alpha=0.3. No kAlphaTurbo / kAlphaBase constants left.
  • stable-diffusion.cpp:2084 (dual-time override) - moved into process_timesteps. process_timesteps now takes a step arg; the SeFi branch returns {sem_timesteps[step], tex_timesteps[step]}. Sample loop is back to a single process_timesteps call.
  • README.md:18 (Important news) - removed the SeFi-Image line.
  • name_conversion.cpp:1210 (prefixes) - removed the backbone. / dual_time_embed. prefix injection. convert_sefi.py already emits canonical model.diffusion_model.* keys.
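The dual-time change to `process_timesteps` described in the list above can be sketched as follows. This is a Python illustration of the dispatch only, under stated assumptions: the real function is C++ in stable-diffusion.cpp and its actual signature is not shown in this thread, so the `version` string and the way the two per-stream schedules are passed in are hypothetical.

```python
from typing import List, Sequence


def process_timesteps(version: str,
                      timesteps: Sequence[float],
                      step: int,
                      sem_timesteps: Sequence[float] = (),
                      tex_timesteps: Sequence[float] = ()) -> List[float]:
    """Return the timestep value(s) handed to the model at `step`."""
    if version == "sefi":
        # SeFi branch: one value per stream, semantic first, then texture.
        return [sem_timesteps[step], tex_timesteps[step]]
    # Every other model keeps a single timestep per step.
    return [timesteps[step]]


# Hypothetical two-step schedules, only to show the shape of the result.
print(process_timesteps("sefi", [], 1, [1.0, 0.5], [0.9, 0.4]))
print(process_timesteps("flux2", [1.0, 0.5], 0))
```

Passing `step` into the function is what lets the sample loop stay a single call: the loop no longer needs to know whether the model consumes one timestep or two.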

Smoke matrix after rework: 5B-turbo (82 s), 5B-Base (510 s), 5B-RL (567 s) all produce the same orange tabby as before. Within ~3% of pre-rework wallclock.

@fszontagh fszontagh force-pushed the feat/sefi-image-prototype branch from 63ef957 to f27063d Compare June 25, 2026 18:58
@fszontagh

fszontagh commented Jun 25, 2026

Contributor Author

5B RL: [image: sefi_5b_rl_v2]
5B base: [image: sefi_5b_base_v2]

Prompt "a lovely cat holding a sign says 'SeFi.cpp'" with 5B turbo (~148s):

./build/bin/sd-cli \
  --diffusion-model /data/SD_MODELS/diffusion_models/sefi/sefi_5b_turbo.safetensors \
  --vae /data/SD_MODELS/vae/flux2_ae_from_sefi.safetensors \
  --llm /data/SD_MODELS/Text-encoder/sefi/qwen3_vl_4b.safetensors \
  -p "a lovely cat holding a sign says 'SeFi.cpp'" \
  --cfg-scale 1.0 --steps 4 -W 1024 -H 1024 -s 42 \
  --extra-sample-args sefi_alpha=1.0 \
  --diffusion-fa --max-vram 8 --stream-layers --offload-to-cpu \
  -o /tmp/sefi_sign.png
[image: sefi_sign]

Prompt "a lovely cat holding a sign says 'SeFi.cpp'" with 5B base (~540s):

./build/bin/sd-cli \
  --diffusion-model /data/SD_MODELS/diffusion_models/sefi/sefi_5b_base.safetensors \
  --vae /data/SD_MODELS/vae/flux2_ae_from_sefi.safetensors \
  --llm /data/SD_MODELS/Text-encoder/sefi/qwen3_vl_4b.safetensors \
  -p "a lovely cat holding a sign says 'SeFi.cpp'" \
  --cfg-scale 4.0 --steps 50 -W 1024 -H 1024 -s 42 \
  --extra-sample-args sefi_alpha=0.3 \
  --diffusion-fa --max-vram 8 --stream-layers --offload-to-cpu \
  -o /tmp/sefi_sign_base.png
[image: sefi_sign_base]

Comment thread script/convert_sefi.py
Comment thread src/model/diffusion/flux.hpp Outdated
Adds inference support for SeFi-Image (https://huggingface.co/SeFi-Image),
a dual-time flow-matching T2I family built on the Flux2 backbone with a
Qwen3-VL text encoder. Tech report: https://arxiv.org/abs/2606.22568.

- VERSION_SEFI_IMAGE + SEFI_FLOW_PRED + version detection from weights
- Dual-time embedding block (semantic + texture, concat)
- SefiFlowDenoiser with alpha-shift + delta_t, dual-time override fed
  via process_timesteps; alpha exposed as --extra-sample-args sefi_alpha
- Qwen3-VL conditioning (chat template, layers 9/18/27)
- Reuses standard Flux2 VAE; semantic channels sliced in
  diffusion_to_vae_latents before the existing get_latents_mean_std path
- script/convert_sefi.py emits transformer-only safetensors with
  canonical model.diffusion_model.* keys; VAE comes from flux2_ae
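The thread says only that alpha = 1.0 is the identity for the alpha-shift and does not give the formula. As a hedged illustration, the sketch below uses a time shift of the form common in flow-matching schedulers, which has that identity property. Whether SefiFlowDenoiser uses exactly this expression is an assumption, `shift_time` and `shifted_schedule` are hypothetical names, and `delta_t` is not modeled here.

```python
def shift_time(t: float, alpha: float) -> float:
    """Assumed shift: alpha * t / (1 + (alpha - 1) * t); alpha = 1.0 returns t."""
    return alpha * t / (1.0 + (alpha - 1.0) * t)


def shifted_schedule(steps: int, alpha: float) -> list[float]:
    """Shift a uniform grid running from t = 1.0 down to t = 0.0."""
    return [shift_time(1.0 - i / steps, alpha) for i in range(steps + 1)]


print(shifted_schedule(4, 1.0))  # alpha = 1.0 leaves the uniform grid unchanged
print(shifted_schedule(4, 0.3))  # alpha < 1.0 pulls interior points toward 0
```

Under this assumed form the endpoints t = 0 and t = 1 stay fixed for any positive alpha, so only the spacing of the interior steps changes.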
@fszontagh fszontagh force-pushed the feat/sefi-image-prototype branch from f27063d to 0934a5a Compare June 26, 2026 16:28
Development

Successfully merging this pull request may close these issues.

[Feature] SeFi-Image-5B-turbo
